Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

Outlier Detection Using Clustering Algorithms

Clustering algorithms can be used to detect outliers (anomalies) by identifying data points that do not fit well into any cluster, or that lie far from cluster centers or dense regions. How outliers are identified depends on the clustering technique; DBSCAN and K-Means are described below.
Outliers are data points that:
  • Do not belong to any cluster
  • Belong to very small or sparse clusters
  • Are far away from their assigned cluster centroid or core region
DBSCAN: Clusters are formed based on density. A point is a core point if at least a minimum number of points lie within a radius eps of it, and clusters grow outward from core points. Outliers: Points that are not density-reachable from any core point do not belong to any cluster; they are automatically treated as noise (labeled -1) and can be reported as outliers.
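The DBSCAN behavior described above can be illustrated with a minimal pure-Python sketch. It is not a library implementation: the function names (region_query, dbscan), the sample points, and the values eps=0.6 and min_pts=3 are all assumptions chosen for illustration, and the neighbor count is assumed to include the point itself.

```python
import math

NOISE = -1  # label for points that end up in no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may become a border point later
            continue
        cluster_id += 1  # points[i] is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Made-up data (an assumption): two dense groups plus one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]

dbscan_labels = dbscan(points, eps=0.6, min_pts=3)
dbscan_outliers = [p for p, label in zip(points, dbscan_labels) if label == NOISE]
print(dbscan_labels)
print(dbscan_outliers)
```

In this sketch the isolated point has no neighbors within eps other than itself, so it never becomes part of a cluster and keeps the noise label. No separate outlier threshold is needed; the result depends only on the choice of eps and min_pts.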
K-Means: Clusters are formed around centroids so as to minimize intra-cluster variance. Outliers: K-Means assigns every point to its nearest centroid, so outliers must be identified in a separate step. Calculate the distance from each point to its assigned centroid; points within a specified distance (threshold) are treated as regular cluster members, while points whose distance exceeds the threshold are considered outliers.
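The distance-to-centroid check for K-Means can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions: the helper names (nearest, kmeans), the sample points, the hand-picked initial centroids, and the fixed threshold of 3.0 are all hypothetical, and a real application would need to choose the threshold from the data.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, initial_centroids, n_iter=20):
    """Plain Lloyd's algorithm with a fixed number of iterations."""
    centroids = list(initial_centroids)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        assign = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    assign = [nearest(p, centroids) for p in points]
    return centroids, assign


# Made-up data (an assumption): two dense groups plus one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]

# Initial centroids are picked by hand so the run is deterministic.
centroids, kmeans_assign = kmeans(points, [points[0], points[4]])

THRESHOLD = 3.0  # hypothetical cutoff chosen for this toy data set

# Distance from each point to the centroid of its own cluster.
distances = [math.dist(p, centroids[a]) for p, a in zip(points, kmeans_assign)]
kmeans_outliers = [p for p, d in zip(points, distances) if d > THRESHOLD]
print(kmeans_assign)
print(kmeans_outliers)
```

Note that the isolated point is still assigned to a cluster and pulls that cluster's centroid toward itself, which inflates the distances of the genuine members. The threshold therefore has to leave enough room for this effect, which is one reason a density-based method can be easier to use for outlier detection.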